Re: Multibyte support and accented characters - Mailing list pgsql-novice

From M. Bastin
Subject Re: Multibyte support and accented characters
Msg-id a05210608bb0ec00f095b@[213.224.147.214]
In response to Multibyte support and accented characters  (Lynna Landstreet <lynna@gallery44.org>)
Responses Re: Multibyte support and accented characters  (Michael Glaesemann <grzm@myrealbox.com>)
Re: Multibyte support and accented characters  (Lynna Landstreet <lynna@gallery44.org>)
List pgsql-novice
At 7:08 PM -0400 6/12/03, Lynna Landstreet wrote:
Hello all,

Can you handle one more question from me? Not related to keyword checkboxes
this time, promise. :-)

Some of the text that will be entered in the database I'm working on
includes some names and titles in other languages - predominantly French,
but occasionally German, Spanish, etc. So I understand from reading the
PostgreSQL docs that in order to handle this, we need to make sure multibyte
support is enabled.

Now, I am not very clear on the various encodings and how they work. I've
been spoiled by years of working on a Mac where you just type option-e if
you want an acute accent, option-u for an umlaut, etc. That's how most of
the text that will be used to populate the database has been generated. So
my questions are:

1. Which encoding would be best for this? I'm guessing Unicode,

Unicode is indeed the safest way to go. It's well on its way to becoming the new common standard across computer platforms.

 but I'm not
sure. We pretty much only have to deal with western European languages, not
with Russian or Chinese or anything.

2. Once the right one is chosen and enabled, is the process pretty much
transparent - i.e., just enter the text and the accented characters will
come through fine,

Not entirely. First, create the database with a Unicode encoding:

CREATE DATABASE mydb WITH ENCODING = 'UNICODE';

Then the front end you use for input must send its data encoded in Unicode (UTF-8). If it sends data in another encoding, use:

SET CLIENT_ENCODING TO '<whatever encoding the front end uses>';

so that PostgreSQL automatically converts it to Unicode.
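To make the point about client encodings concrete: the same accented text is a different byte sequence in each encoding, so the server can only convert it correctly if it is told which one the client uses. The Python sketch below is a rough model of that conversion, not PostgreSQL's own code; `to_server_utf8` is a hypothetical helper, and Mac Roman is included only on the assumption that text typed on a classic Mac was saved in that encoding (check the manual for which client encodings PostgreSQL actually accepts).

```python
# Hypothetical helper: a rough model of what SET CLIENT_ENCODING asks the
# server to do. Interpret incoming bytes in the declared client encoding,
# then store them as UTF-8. This is not PostgreSQL's actual conversion code.
def to_server_utf8(raw: bytes, client_encoding: str) -> bytes:
    return raw.decode(client_encoding).encode("utf-8")

title = "Café à Montréal"

# The same text as three different byte sequences.
mac_bytes = title.encode("mac_roman")   # classic Mac OS: é is 0x8E
latin1_bytes = title.encode("latin-1")  # ISO-8859-1: é is 0xE9
utf8_bytes = title.encode("utf-8")      # UTF-8: é is 0xC3 0xA9

# Correctly declared client encodings all end up as the same UTF-8 text.
print(to_server_utf8(mac_bytes, "mac_roman") == utf8_bytes)   # True
print(to_server_utf8(latin1_bytes, "latin-1") == utf8_bytes)  # True

# In this model a wrong declaration raises no error; it just stores
# different characters than the ones that were typed.
garbled = to_server_utf8(mac_bytes, "latin-1")
print(garbled == utf8_bytes)  # False
```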

Read the manual for further information: http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html

 or do I have to do something special with them, like the
way they have to be encoded with &...; ASCII codes in HTML?

3. Speaking of HTML, even if PostgreSQL is set up to correctly deal with
accented characters, when the output is displayed on the web, are they going
to have to be converted into &...; form?

Here too, you have to tell the browser that it is going to receive Unicode data. I don't know whether you can do this in HTML, or whether the user must choose Unicode from the browser's encoding menu.
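For what it's worth, an HTML page can declare its own encoding with a `meta` tag (the web server's HTTP `Content-Type` header can do the same), so accented letters can be sent as plain UTF-8 without `&...;` codes. Numeric character references remain an option if the output must be pure ASCII. The Python sketch below shows both approaches; `render_page` and `render_ascii_safe` are hypothetical helpers for illustration only.

```python
import html

META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

def render_page(title: str) -> bytes:
    # Hypothetical helper: a minimal UTF-8 page that declares its encoding.
    # html.escape only touches markup characters (&, <, >, quotes);
    # accented letters are left alone and sent as UTF-8 bytes.
    safe = html.escape(title)
    page = (
        f"<html><head>{META}<title>{safe}</title></head>"
        f"<body><h1>{safe}</h1></body></html>"
    )
    return page.encode("utf-8")

def render_ascii_safe(text: str) -> bytes:
    # Alternative: escape markup, then replace every non-ASCII character
    # with a numeric character reference such as &#233; for é.
    return html.escape(text).encode("ascii", "xmlcharrefreplace")

print(render_ascii_safe("Café"))           # b'Caf&#233;'
print(render_ascii_safe("Müller & Fils"))  # b'M&#252;ller &amp; Fils'
```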

Perhaps you can have PostgreSQL translate the encoding to ISO Latin-1 (LATIN1), which is close to the Windows Western European default.
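One caveat with Latin-1 as an output encoding: it covers most Western European accented letters, but not all. French œ (as in "cœur") is missing from ISO-8859-1, although the Windows-1252 code page has it. A quick Python check makes the gap visible; `fits_in` is a hypothetical helper, and the sample words are illustrative. Keeping the database itself in Unicode avoids losing such characters at the storage level.

```python
def fits_in(text: str, encoding: str) -> bool:
    # Return True if every character of text has a code in the given encoding.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

samples = ["Montréal", "Müller", "España", "cœur"]
for word in samples:
    # ISO Latin-1 vs. the Windows Western European code page (cp1252)
    print(word, fits_in(word, "latin-1"), fits_in(word, "cp1252"))
# Montréal True True
# Müller True True
# España True True
# cœur False True
```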

It's better if someone else answers this one for you.

Marc
